Abstract

We consider Bayesian multivariate density estimation using a Dirichlet
mixture of normal kernels as the prior distribution. By representing a
Dirichlet process as a stick-breaking process, we are able to extend
convergence results beyond finitely supported mixture priors to Dirichlet
mixtures. Thus our results have new implications in the univariate
situation as well. Assuming that the true density satisfies Hölder smoothness
and exponential tail conditions, we show that the rates of posterior
convergence are minimax-optimal up to a logarithmic factor. This procedure is
fully adaptive since the priors are constructed without using knowledge
of the smoothness level.

1 Introduction 

Kernel methods for density estimation have been well studied over the past fifty
years ([25]). In the nonparametric Bayesian literature, the study of asymptotic
properties of posterior distributions has received a lot of interest since the
development of efficient Markov chain Monte Carlo (MCMC) methods ([6] and [19]).
A general result on posterior consistency was established in and [23] and 
then applied to the univariate Dirichlet mixture of normal prior. General
posterior convergence rate theorems were obtained in [8] and [22]. Ghosal and van
der Vaart [10] considered the univariate Bayesian density estimation problem using
a Dirichlet mixture of normal kernels and studied the case when the true density
is of location-scale mixture type with its standard deviation bounded away
from zero and infinity. Although the posterior rate is nearly the parametric
rate, the assumption of a "super smooth" true density with a bounded
range of standard deviations is quite restrictive. Using a new general rate
theorem, Ghosal and van der Vaart [11] obtained the posterior convergence rate of
the univariate Dirichlet mixture of normal kernels when the true density is only
twice continuously differentiable. Though the number of mixture components
increases, the minimax rate is still obtained. These results need a prior on the
bandwidth parameter that scales appropriately with increasing sample size.

In recent studies, rate-adaptive estimators based on posterior distributions
have been constructed to accommodate different levels of smoothness of the
underlying true function of interest. Belitser and Ghosal pQ considered the
problem of estimating a signal in Gaussian white noise and showed that the
posterior rate automatically adapts to the unknown smoothness if
the "smoothness parameter" only takes values in a discrete set. Huang [12]
and Ghosal, Lember and van der Vaart [S] showed that appropriate mixtures
of priors based on spline expansions or wavelets yield optimal posterior rates
for a finite or countable range of smoothness parameters in density estimation
and nonparametric regression problems. Alternatively, [21] constructed a prior
based on a randomly rescaled smooth Gaussian process, which automatically
adapts over a continuous range of smoothness parameters. They treated the
multidimensional case as well. A technical challenge in proving adaptation of the
posterior distribution is to find an approximation of the true function within
the model whose accuracy increases appropriately with the smoothness
level of the true density. An interesting approximation idea proposed by [20] in
the context of beta mixture priors turns out to be very helpful for constructing
the required approximation and the subsequent adaptive posterior distributions. A
similar idea for normal mixtures was proposed by [13]. An analogous
approximation in the multidimensional situation was constructed recently in [3]. They
used a special type of Gaussian process to construct an adaptive procedure.
However, their constructions apply only to compactly supported densities. The
issue of unboundedness of the support was resolved in [13] for univariate
Gaussian mixtures by imposing appropriate tail conditions on the true density.

The adaptation results in [13] used a prior based on finite mixtures of the
normal kernel in a univariate setting. In practice, Dirichlet mixture priors are
popular in univariate density estimation problems ([S] and [S]), as
well as in multivariate situations ([IS]). Posterior consistency results in
terms of the $L_1$-distance were studied in [26] under a multivariate setting. An
extension to multivariate mixed-scale density estimation was discussed in [2].

In this paper, we study the posterior convergence rates for Bayesian
multivariate density estimation. We extend the approximation result in [13] to the
multidimensional setting assuming local $\beta$-Hölder smoothness and exponential
tail conditions. Using the stick-breaking representation ([15]), we approximate
a Dirichlet process by a finite sum of mixtures while the error is controlled
within a predetermined level, which helps us construct appropriate sieves for
the problem. A similar technique has been used in [17] to prove posterior
consistency for conditional density estimation. We calculate the entropy and the prior
concentration rate around the true density. The posterior rate is shown to be
$n^{-\beta/(2\beta+d)}(\log n)^{\kappa}$, where $\kappa$ is determined by the
smoothness level, the dimension of the sample space and the tail behavior of
the true density. The rate coincides with the minimax rate up to a
logarithmic factor.

To the best of our knowledge, most frequentist approaches to adaptive
estimation focus on wavelets in a regression model setting ([4]
and [18]). The performance of adaptive multivariate kernel density estimation
depends heavily on the choice of the bandwidth matrix and the smoothing
kernel (|2T]). We consider a kernel-based Bayesian adaptive estimation
procedure that achieves optimal rates using a product kernel.

The paper is organized as follows. In Section 2, some notations and as- 
sumptions on the true density are introduced. The main results on posterior 
convergence rates are presented in Section 3. Approximation results are given 
in Section 4. Section 5 gives the proof of the main rate theorem. A few auxiliary 
lemmas and their technical proofs are presented in the Appendix. 

2 Notations and assumptions 
2.1 Notations 

Throughout the paper, we consider estimating a density $f$ on $\mathbb{R}^d$ based on
$n$ independent and identically distributed (i.i.d.) samples $X_1, \ldots, X_n$ taking
values in $\mathbb{R}^d$. Let $X = (X_1, \ldots, X_d)$ stand for a generic observation from
density $f$. We define the marginal density functions of $f$ for $X_i$ as $f_i(x_i)$,
$i = 1, \ldots, d$. Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ and let $\Delta_k$ be the
$k$-dimensional unit simplex. For $k \in \mathbb{N}^d$, $x \in \mathbb{R}^d$, let
$k. = k_1 + k_2 + \cdots + k_d$, $k! = k_1! \cdots k_d!$ and $x^k = x_1^{k_1} \cdots x_d^{k_d}$.
Similarly, for a real-valued function $f$ on $\mathbb{R}^d$, let
$f(x)^k = f(x_1)^{k_1} \cdots f(x_d)^{k_d}$.
We define a partial order for $j$ and $k$ as $j \ge k$ if $j_i \ge k_i$ for $i = 1, \ldots, d$. Let
$\|x\|_p = \{\sum_{i=1}^{d} |x_i|^p\}^{1/p}$ stand for the $\ell_p$-norm of a vector
$x \in \mathbb{R}^d$, $1 \le p < \infty$, and $\|x\|_\infty = \max_{1 \le j \le d} |x_j|$.
Moreover, for $p = 2$, we simply write $\|x\|_2$ as $\|x\|$.
For $b > 0$, let $r_b$ stand for the largest integer strictly smaller than $b$.

We use $\sigma = (\sigma_1, \ldots, \sigma_d)' \in \mathbb{R}^d_+$ as the scale parameter
and define a $d \times d$ diagonal matrix $\Sigma = \mathrm{diag}(\sigma)$. Let
$\phi(x) = (2\pi)^{-1/2} \exp(-x^2/2)$ be the standard normal density and, for a
scalar $\sigma > 0$, $\phi_\sigma(x) = \sigma^{-1}\phi(x/\sigma)$. The corresponding
multivariate normal density with independent components is denoted by
$\phi_\sigma(x) = \prod_{i=1}^{d} \phi_{\sigma_i}(x_i)$.
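As a concrete reading of the kernel notation above, the following Python sketch evaluates the rescaled univariate kernel and the product kernel. The function names (`phi`, `phi_scaled`, `product_kernel`) are illustrative choices and not notation from the paper; only the standard library is assumed.

```python
import math


def phi(x):
    """Standard normal density (2*pi)^(-1/2) * exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def phi_scaled(x, s):
    """Rescaled univariate kernel phi_s(x) = phi(x / s) / s for a scale s > 0."""
    if s <= 0:
        raise ValueError("scale must be positive")
    return phi(x / s) / s


def product_kernel(x, sigma):
    """Product kernel prod_i phi_{sigma_i}(x_i) for equal-length sequences x, sigma."""
    if len(x) != len(sigma):
        raise ValueError("x and sigma must have the same length")
    value = 1.0
    for xi, si in zip(x, sigma):
        value *= phi_scaled(xi, si)
    return value
```

For instance, `product_kernel([0.0, 0.0], [1.0, 2.0])` multiplies the two univariate kernel values at the origin, one with scale 1 and one with scale 2.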

We use $\lesssim$ for inequality up to a constant multiple, where the underlying
constant of proportionality is universal or not important for our purposes. We
define a linear operator $K_{\sigma_i}$ as

$$K_{\sigma_i} f(x) = \int_{-\infty}^{\infty} f(x_1, \ldots, x_{i-1}, x_i - y_i, x_{i+1}, \ldots, x_d)\, \phi_{\sigma_i}(y_i)\, dy_i. \qquad (2.1)$$

Then a composition operator is defined as $K_{\sigma_i}^{j} f = K_{\sigma_i}(K_{\sigma_i}^{j-1} f)$. Note that
these convolution operators commute with each other. We extend this notation
to the multivariate case as $K_\sigma^{j} f = (K_{\sigma_1}^{j_1} \cdots K_{\sigma_d}^{j_d}) f$. For simplicity, we define

We use $D(\epsilon, T, d)$ to denote the packing number, which is defined as the
maximum cardinality of an $\epsilon$-dispersed subset of $T$ with respect to distance $d$.
Similarly, we write $N(\epsilon, T, d)$ for the covering number, the minimal cardinality
of an $\epsilon$-net for $T$ in terms of the distance $d$. We define $\log_+(x) = \max(\log x, 0)$.
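To make the two entropy notions concrete, the sketch below greedily builds a maximal $\epsilon$-separated subset of a finite point set. By maximality, every point lies within $\epsilon$ of a selected point, so the selected set is also an $\epsilon$-net; for a finite set its size is therefore at least the covering number and at most the packing number. The helper names and the strict-inequality convention for separation are assumptions made for illustration.

```python
def l1_dist(u, v):
    """l_1 distance between two equal-length coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(u, v))


def greedy_separated_subset(points, eps, dist=l1_dist):
    """Return a maximal subset whose pairwise distances all exceed eps.

    Points are scanned in order; a point is kept only if it is farther
    than eps from every point kept so far. Any rejected point is within
    eps of some kept point, so the result is also an eps-net of `points`.
    """
    chosen = []
    for p in points:
        if all(dist(p, q) > eps for q in chosen):
            chosen.append(p)
    return chosen
```

The greedy scan is order-dependent and does not, in general, attain the packing number; it only brackets the two quantities.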
2.2 Assumptions on the true density

Let $f_0$ stand for the true density. We assume the following conditions on $f_0$.

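• (C1) Smoothness: The function $\log f_0$ is assumed to be locally $\beta$-Hölder
with derivatives $l_j(x) = \frac{\partial^{j.} \log f_0(x)}{\partial x_1^{j_1} \cdots \partial x_d^{j_d}}$.
We assume the existence of a polynomial $L$ and a constant $\gamma > 0$, such that for $r = r_\beta$,

$$|l_k(x) - l_k(y)| \le r!\, L(x)\, \|x - y\|^{\beta - r}$$

for all $k. = r$ and $x, y$ satisfying $\|x - y\| \le \gamma$. Moreover, there exists a
constant $\epsilon_0 > 0$ such that for all $j. \le r$,

$$\int f_0(x) |l_j(x)|^{(2\beta + \epsilon_0)/j.}\, dx < \infty, \qquad \int f_0(x) |L(x)|^{2 + \epsilon_0/\beta}\, dx < \infty. \qquad (2.2)$$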

• (C2) Marginal-joint relationship: There exist a constant Co and density 
functions gi,...,gd such that fo{xi, ...,Xd)>Co n^=i 9i{xi), such that 
/ foix){l/g{x)Y meix{l,\\x\\'^)dx < 00 for some ^ > 0, where g{x) — 

• (C3) Tail monotonicity: On a region $D = [-a, b]^d$, where $a, b > 0$, we
have that $\inf_{x \in D} g(x) = c_0 > 0$, and $g_i$ is nondecreasing on $x_i < -a$ and
nonincreasing on $x_i > b$ for $i = 1, \ldots, d$.

• (C4) Tail decay: The true density $f_0$ has exponential tails on $D^c$, i.e.,
there exist constants $C > 0$ and $\tau_1, \tau_2 > 0$, which only depend on $f_0$, such
that

$$f_0(x) \le C e^{-\tau_1 \|x\|^{\tau_2}}, \qquad x \in D^c. \qquad (2.3)$$



Remark 1 Conditions (C2) and (C4) imply J fo{^og_^{fo/ g)Y < oo for any 
p > 0. Conditions (CI), (C3) and (C4) imply / /o ( log.,. /o) < oo for any 
p> 0. 

A wide range of multivariate density functions satisfy Condition (C2), 
e.g., nonsingular multivariate normal distribution and their finite mixtures. 
To see this, consider k multivariate normal densities fj, j = l,...,k, with 
mean and covariance matrix Sj. For any convex combination of f/s f* = 
Yl^=i^jfjy there exists A > such that f*{x) > exp{— A||a:;||^/2}. Define 
density g* = (A/27r)'^/^ exp{— A||a;p/2}, then /* > g*. To see this, choose 
A to be the smallest eigenvalue of all Sj^s. Then for any < < 1, 
/ /*(l/^*)^max(l, \\xf) < oo. Hence Condition (C2) holds for /*. 

Condition (C2) also holds for product type densities fo{x) = Y['j=i fji^j) 
with g = /o, if / /q max(l, ||a[;||^)(ia3 < oo for some < ^ < 1. 

Remark 2 Condition (C2) is used to lower bound $K_\sigma f_0$ as in Lemma 2.
Condition (C3) generalizes the monotone tail condition in [13] to the multivariate
case.

3 Main results 

We construct a prior for $f$ as follows:

• $p_{F,\sigma}(x) = \int_{\mathbb{R}^d} \phi_\sigma(x - \mu)\, dF(\mu)$;

• $F$ follows a Dirichlet process with base measure $\alpha$. Denote
$\bar\alpha = \alpha/\alpha(\mathbb{R}^d)$. We assume that there exist constants $a_1, a_2 > 0$ such that
$1 - \bar\alpha([-z, z]^d) \lesssim \exp\{-a_1 z^{a_2}\}$ for sufficiently large $z > 0$.
• $\sigma_i \sim G$ for $i = 1, \ldots, d$, where $G$ is a fixed probability distribution

satisfying G{x) < exp{— Cix^"^} as x — )• and 1 — G{x) < x'*^ as 
X — )■ oo, where Ci > and eta > 1 are fixed constants. This condition 
allows a wide class of distributions, e.g., an inverse gamma distribution 
for (T^ when as = 2 or an inverse gamma distribution for a when as = 1. 

We have the following result for posterior convergence rates:

Theorem 1 Suppose that the true density $f_0$ satisfies Conditions (C1)-(C4).
Then the posterior rate of convergence with respect to the Hellinger or $L_1$-distance
is given by $\epsilon_n = n^{-\beta/(2\beta+d)}(\log n)^t$, where
$t > \left(\frac{d}{\tau_2} + d + 1\right)\frac{\beta}{2\beta+d}$.
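The rate in Theorem 1 can be evaluated numerically to see how it depends on the smoothness and the dimension. The sketch below assumes the rate has the form $n^{-\beta/(2\beta+d)}(\log n)^t$, as the theorem is read here; the function name `posterior_rate` and its argument names are hypothetical.

```python
import math


def posterior_rate(n, beta, d, t):
    """Evaluate n^(-beta / (2*beta + d)) * (log n)^t.

    n    : sample size (n >= 2 so that log n > 0)
    beta : smoothness level, beta > 0
    d    : dimension of the sample space, d >= 1
    t    : exponent of the logarithmic factor
    """
    if n < 2 or beta <= 0 or d < 1:
        raise ValueError("need n >= 2, beta > 0 and d >= 1")
    return n ** (-beta / (2.0 * beta + d)) * math.log(n) ** t
```

Setting `t = 0` gives the polynomial part $n^{-\beta/(2\beta+d)}$ alone, so the ratio of the two evaluations isolates the logarithmic factor.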

The assumption on the base measure $\alpha$ is analogous to (11) of [13]. Our
tail conditions on the prior of $\sigma$ are weaker than the one in pjj. Both sets of
conditions are needed to control the prior probability of the model.

For simplicity, we let $\sigma_1 = \cdots = \sigma_d$ in the discussion. However, our results
also hold for independently, not identically distributed $\sigma_1, \ldots, \sigma_d$ as long as
$\max_j \sigma_j \le C_2 \{\min_j \sigma_j\}^{C_3}$ for some constants $C_2, C_3 > 0$.

Our result also applies to finite mixture priors. We consider the following
prior for $f$:

• $m(x; k, \mu, \omega, \sigma) = \sum_{j=1}^{k} \omega_j \phi_\sigma(x - \mu_j)$;

• There exist constants $c_1 > c_2 > 0$ and $c_3 \ge 0$ such that

$$\exp\{-c_1 k (\log k)^{c_3}\} \lesssim \Pi(k) \lesssim \exp\{-c_2 k (\log k)^{c_3}\}.$$
• Given $k$, $\mu_1, \ldots, \mu_k$ are i.i.d. realizations from a distribution which
satisfies $\Pi(\mu_i \notin [-z, z]^d) \lesssim \exp\{-c_4 z^{c_5}\}$ for sufficiently large $z > 0$ and
constants $c_4, c_5 > 0$.

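• Given $k$, the prior on weights $\omega = (\omega_1, \ldots, \omega_k)'$ satisfies

$$\Pi(\|\omega - \omega_0\|_1 \le \epsilon) \gtrsim \exp\{-c_6 k (\log k)^{a_4} \log_{-}\epsilon\}$$
for any $\omega_0 \in \Delta_k$ and constants $a_4, c_6 > 0$ and $0 < \epsilon < 1/k$.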

• The bandwidths $\sigma_1, \ldots, \sigma_d$ are i.i.d. and follow inverse gamma distributions.

Then we have the following rate theorem, which is a generalization of Theorem
2 of [13].

Theorem 2 Suppose that the true density $f_0$ satisfies Conditions (C1)-(C4).
Then the posterior rate of convergence with respect to the Hellinger or $L_1$-distance
is given by $\epsilon_n = n^{-\beta/(2\beta+d)}(\log n)^t$, where
$t > \frac{\beta}{2\beta+d}\left(\frac{d}{\tau_2} + d + \max\{c_3, 1 + a_4, \frac{c_5}{\tau_2}\}\right) + \max\{0, (1 - c_3)/2\}$.

4 Approximation results 

The following proposition helps prove the main theorem on posterior
convergence rates. It is also of interest in its own right, as it bounds the
Kullback-Leibler (KL) divergence between $f_0$ and its approximation. The proof is
given in the Appendix.

Proposition 1 Let $f_0$ be the true density satisfying Conditions (C1)-(C4).
Then there exists a density $h_\beta$ such that for all sufficiently small $\sigma$,

/^»(^Hlog5^)^<^x^O(.-). (4.2) 

To prove the approximation result, we use the expansion technique in
[13] and its multivariate modification described by [3].

Let $r$ and $\beta$ be defined as in Condition (C1). For $k \in \mathbb{N}^d$, we define
moments $m_k = \int y_1^{k_1} \cdots y_d^{k_d} \phi(y)\, dy$. Then we recursively define two collections
of numbers $c_n$ and $d_n$ as follows:

For $n \in \mathbb{N}^d$, if $n. = 1$, then $c_n = d_n = 0$. For $n. \ge 2$, define

$$c_n = \sum_{n = l + k,\; l. \ge 1,\; k. \ge 1} \frac{(-1)^{k.+1}}{k!}\, m_k\, d_l, \qquad d_n = \frac{(-1)^{n.}\, m_n}{n!} + c_n. \qquad (4.3)$$

Since the Gaussian kernel is symmetric about 0, all odd moments are 0. Hence
$c_n$ can be simplified as $c_n = -\sum_{n = l + 2k,\; l. \ge 1,\; k. \ge 1} m_{2k}\, d_l / (2k)!$.
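The moments $m_k$ of the standard normal product kernel have a closed form that makes the vanishing of odd moments explicit. The sketch below uses the standard fact that $E[Z^k] = (k-1)!!$ for even $k$ and $0$ for odd $k$ when $Z$ is standard normal; the function names are illustrative and not taken from the paper.

```python
def double_factorial(m):
    """Return m!! = m * (m - 2) * ... with the convention (-1)!! = 0!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def normal_moment_1d(k):
    """E[Z^k] for Z ~ N(0, 1): zero for odd k, (k - 1)!! for even k."""
    if k < 0:
        raise ValueError("moment order must be nonnegative")
    return 0 if k % 2 else double_factorial(k - 1)


def normal_moment(k):
    """m_k = prod_i E[Z_i^{k_i}] for a multi-index k with independent components."""
    result = 1
    for ki in k:
        result *= normal_moment_1d(ki)
    return result
```

A multi-index with any odd component gives a zero moment, which is why only even multi-indices $2k$ survive in the simplified sum.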

Define fp = f - Ei=i Efc.=j 4cr'"(^i/), where = ^^r^^- Lemma 
3.4 in shows that the supremum distance between /o and fp is 0(cr'^). 

However, this type of construction does not guarantee that fp is a density 
function because it may take negative values. To overcome the problem, we 
define a truncated version of fp and then standardize it to obtain a density 
function: 

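$$h_\beta^*(x) = f_\beta(x)\, 1\{f_\beta(x) > \tfrac{1}{2} f_0(x)\} + \tfrac{1}{2} f_0(x)\, 1\{f_\beta(x) \le \tfrac{1}{2} f_0(x)\},$$

$$h_\beta(x) = h_\beta^*(x) \Big/ \int h_\beta^*(u)\, du. \qquad (4.4)$$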
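The truncate-then-standardize step can be illustrated on a grid. The sketch below assumes that $f_\beta$ and $f_0$ have already been evaluated on a common regular grid with cell volume `cell_volume`, replaces $f_\beta$ by $f_0/2$ wherever $f_\beta$ does not exceed $f_0/2$, and normalizes with a Riemann sum. The function name, the grid representation, and the Riemann-sum normalization are assumptions for illustration only.

```python
def truncate_and_normalize(f_beta_vals, f0_vals, cell_volume):
    """Grid version of the truncation and standardization of f_beta.

    f_beta_vals, f0_vals : values of f_beta and f_0 on the same grid points
    cell_volume          : volume of one grid cell (Riemann-sum weight)
    Returns the normalized values, which integrate to 1 under the Riemann sum.
    """
    if len(f_beta_vals) != len(f0_vals):
        raise ValueError("grids must have the same length")
    # Keep f_beta where it exceeds f_0 / 2, otherwise fall back to f_0 / 2.
    h_star = [fb if fb > 0.5 * f0 else 0.5 * f0
              for fb, f0 in zip(f_beta_vals, f0_vals)]
    total = sum(h_star) * cell_volume
    if total <= 0:
        raise ValueError("truncated function has no mass on this grid")
    return [h / total for h in h_star]
```

Because the truncated values are at least $f_0/2$, the result is nonnegative wherever $f_0$ is, even if $f_\beta$ dips below zero.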
Remark 3 From (4.4), we get hp < h*^ < + /q. Using the same arguments
in [13], we can show //3 < /q. Then combining these two facts, we conclude 
that hp is upper bounded by a multiple of /q. 

Remark 4 From the definition, $f_\beta$ can be expressed as a linear combination
of $K_\sigma^j f_0$'s:

$$f_\beta = C_\beta f_0 - \sum_{j > 0} c_j K_\sigma^j f_0, \qquad (4.5)$$

where $C_\beta$ and $c_j$ are constants determined by $f_0$ and $\beta$. The coefficients $c_j$
satisfy $\sum_{j > 0} c_j = C_\beta - 1$. Hence $K_\sigma f_\beta$ is also a linear combination of $K_\sigma^j f_0$'s.

The approximating mixture in Proposition 1 can be discretized without
changing the order of the approximation error. The following lemma is a
multivariate generalization of Lemma 4 in [13]. It will be used to lower
bound the prior probability of the KL-ball around $f_0$. Its proof is given in
the Appendix.

Lemma 1 Let $f_0$ be a density satisfying Conditions (C1)-(C4). Then there
exists a finitely supported probability measure $F$ with at most
Cia~'^\\og(T\^/'^^'^^ support points from the set {x : fo{x) > ccr^^^'^/^}, where 
Ci > is a constant such that 

//ologA = o(a^/^), I fo{\og^Y = 0{a'^). (4.6) 

5 Proof of Theorems 
5.1 Some useful results 

We first state a few results that are helpful for proving Theorem 1.
A Dirichlet process $F$ with base measure $\alpha$ can be represented by Sethuraman's
stick-breaking process as $F = \sum_{i=1}^{\infty} V_i \delta_{\theta_i}$, where
$V_i = Y_i \prod_{j=1}^{i-1} (1 - Y_j)$, $\theta_i \sim G$, $Y_i \sim \mathrm{Be}(1, M)$,
$i = 1, 2, \ldots$, $G$ is the cumulative distribution function of $\bar\alpha$ and
$M = \alpha(\mathbb{R}^d)$. We truncate the stick-breaking procedure after a certain level such
that the error is within a predetermined level. Define the number of terms
needed in the finite mixture as $N_\epsilon = \inf\{m \ge 1 : \sum_{i=1}^{m} V_i > 1 - \epsilon\}$. Let
$F_\epsilon = \sum_{i=1}^{N_\epsilon} V_i \delta_{\theta_i} + K \delta_{\theta_0}$, where
$K = \prod_{i=1}^{N_\epsilon} (1 - Y_i)$ and $\theta_0 \sim G$ independently of
everything else. By Lemma 3 of [TB], it follows that

$$d_{TV}(F, F_\epsilon) \le \epsilon, \qquad (5.1)$$

$$N_\epsilon - 2 \sim \mathrm{Poi}(M \log_{-}\epsilon), \qquad (5.2)$$

where $d_{TV}$ stands for the total variation distance. It is easy to see from (5.1)
that $\|p_{F,\sigma} - p_{F_\epsilon,\sigma}\|_1 \le \epsilon$.
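The truncation described above can be simulated directly. The sketch below draws stick-breaking weights until the accumulated mass exceeds $1 - \epsilon$ and then lumps the leftover mass onto one extra atom, mirroring the construction of the truncated measure. The function name `truncated_stick_breaking`, the callback `draw_atom` (standing in for a draw from the normalized base measure), and the use of Python's `random` module are assumptions for illustration.

```python
import random


def truncated_stick_breaking(mass, eps, draw_atom, rng=None):
    """Sample a truncated stick-breaking representation of a Dirichlet process.

    mass      : total mass M of the base measure (M > 0)
    eps       : truncation level, 0 < eps < 1
    draw_atom : callable taking the random generator and returning one atom
    Returns (weights, atoms); the last entry carries the leftover mass.
    """
    if not 0 < eps < 1 or mass <= 0:
        raise ValueError("need 0 < eps < 1 and mass > 0")
    rng = rng or random.Random()
    weights, atoms = [], []
    remaining = 1.0  # product of (1 - Y_j) over the sticks broken so far
    # Stop at the first m with V_1 + ... + V_m > 1 - eps, i.e. remaining < eps.
    while remaining >= eps:
        y = rng.betavariate(1.0, mass)
        weights.append(remaining * y)
        atoms.append(draw_atom(rng))
        remaining *= 1.0 - y
    # Place the leftover mass on one additional independent atom.
    weights.append(remaining)
    atoms.append(draw_atom(rng))
    return weights, atoms
```

The leftover weight is below `eps` by construction, which is the mechanism behind the total variation bound in (5.1).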

The following lemma lower bounds $K_\sigma f_0$.

Lemma 2 Assume that $f_0$ satisfies Conditions (C2) and (C3). Then, for $\sigma$
sufficiently small, $K_\sigma f_0 \ge C g$ for some constant $C > 0$ and the density function $g$ defined
in (C2).

We need the following inequalities to help lower bound the prior probability
of the KL-ball around $f_0$.

Lemma 3 Let $\mathbb{R}^d = \bigcup_{j=0}^{N} U_j$ be a partition of $\mathbb{R}^d$ and
$F' = \sum_{j=1}^{N} p_j \delta_{z_j}$ be a probability measure with $z_j \in U_j$ and
$\|z_j - z_k\|_1 \ge 2\epsilon$ for $j, k = 1, \ldots, N$,
$j \ne k$ and $\epsilon > 0$. Define $V(z_j, \epsilon) = [z_{j,1} - \epsilon, z_{j,1} + \epsilon] \times \cdots \times [z_{j,d} - \epsilon, z_{j,d} + \epsilon]$
and $x_{(1)} = \min_{1 \le i \le d} x_i$ for $x \in \mathbb{R}^d$. Then for any probability measure $F$ on $\mathbb{R}^d$ and
$\sigma, \sigma' \in \mathbb{R}^d_+$, we have that
N 



\\PF,a-PF',a'\\i< max ^ + — -—— + J2\F{V{zj,e))-pj\, (5.3) 

»=i,...,d (XiAai ((7(1) Aa^^))'* 

and 

e , inti^i-nti^^i 



bF,<T -PF',,t'||oo < 



N 



The following discretization result gives multidimensional extensions of
Lemmas 3.1 and 3.3 of [10]. Their proofs are given in the Appendix.

Lemma 4 (1) LetO < e < 1/2 be given. Fixa-Q, cr'^ G [o:q,(To]'^ satisfying Ictq — 
ctq'^I < a^e, then for any probability measure F on a region D' = [— ai, ai] x ■ x 
[— a^, aj, where maxj < L(log_ e)^", 7o > 1/2 and L > are constants, there 
exists a discrete probability measure F' on D with at most N < ffg ^'^(log_ eY"'^^ 
support points such that \\pF,a-o — PF',a-'J\oo ^ ^o'q/q^^- 

(2) // o" — > 0, then for any probability measure F on [—a^,aeY with = 
L(log_ e)'''^, where 7i>0, 0<e<l/2 and L > are constants, there exists a 
discrete probability measure F' on [— a^, a^Y with at most N < o"~°'(log„ e)'^^"'"'"'^ 
support points such that 

\\Pf,ct - PF',a'\\oo < cr^'^f^, (5.5) 

and 

Wpf,. - PF',Ai < ^'(^(log- e)V2 V (log_ ey^Ye. (5.6) 
5.2 Proof of Theorem 1 (Part I)

We apply Theorem 5 of [llj for g„ = 72-/^/(2/3+'=') ^y^ ^ ^ ^-p/(2p+d) 

for t2> ti. We construct appropriate sieves J^n,j and verify the following three 

conditions: 

E,°lo ViV(e„,J-„,„d)v/n„(J-„,,)e-"^"" ^ (5.7) 

n„(/C(/o,e„)) >e— " (5.8) 

"^niK) < e-'"^", (5.9) 

where /C(/o,e) = {/ : //o(log/o//) < e, jh{\ogh/ff < e } is the KL ball 
around /o of size e. Choose o"„_i = . . . = cr„ = e^^^. Define 

a„ = n-^, a, = expjne^ (logn)^}, = [n'^'^^^+'^\\ognf-\ + 1 (5.10) 

and hn > ^'^/"^((i+z/?) f^j, A > 1, a2,tr,6 > 0. First we consider the collection of 
finite mixtures: 

k 

= I ^Wi0o-(a3 - fJ'i) ■ k < r„, e fen]*^, cr G S'„, j = 1, . . . , fcj 

i=l 

as in ^iij, where = [o:„,o-n]'^- 
Define the sieve 

J^n = {pf,(t '■ there exists Pf',<t G J-"* such that dTv(-F, F') < e„}. (5.11) 

Notice that J^* C J^n- 

We first verify (5.9). From the construction of the prior on $\sigma_i$ in
Section 3,

nn(cri G (a„,a„)'') < exp{-Cin'^''^} + exp{a3ne^(logn)''} 
< exp{-C6ne^} 
for some constant $C_6 > 0$ when $n$ is sufficiently large.

Given the number of mixtures N^^ fixed, from the assumption, we have 

I^n{^l ^ [-bn^bnflN,^ = k) < m„(Mi ^ fcj'^) < fcc"'^^^"' . (5.12) 

Therefore, 

oo 

k=l 

< e-"i''"'logn. (5.13) 

Using (15. 2p and tail estimates of Poisson distribution P{X > r) < exp{— r log r} 
if X ~ Poi(A) and r > Ae, we have the following results for X = N^^ and r = r„ 

n„(iV,„ > r„) < exp{-r„logr4 < exp{-n^/('^+2^)(logn)*'-+i}. (5.14) 

All three bounds together give 

iiniK) < n„( j-r) < n„(50 + n„(iv, > r„) + n„(/x ^ [-6„, foj'^) 

< exp{-C7n''/(^+2/3)(iQg^)t.+i| ^5^15) 

for some constant C7 > 0, which decreases faster than e"^"*^" if t,. + 1 > 2ti. 
5.3 Proof of Theorem 1 (Part II)

To verify (5.7), we split $(\sigma_n, \bar\sigma_n)$ into $J_n + 1$ disjoint subsets

(a„, a„) = U (a„(l + ~eny-\a^{l + inY) U (a„(l + ?„)^", a„) (5.16) 
for $J_n = \lfloor \log(\bar\sigma_n/\sigma_n) / \log(1 + \epsilon_n) \rfloor$. Hence we obtain a partition of $S_n$ with
(J„ + 1)'^ subsets. Denote Snj = <S)i=ih.ni^ + en)-''"\ + inY' V aj, where 

ji 1) • • • ; <-^n ~l~ 1- 



Then define 

k 



i=\ i=\ 

Tn,j = {pf,(t '■ there exist PF',a- £ J^nj such that dxvl-^, -^0 ^ (^n}- 
We can bound the prior probabihty on J^nj by 

T^ni^nj) < T^niSnJ) < (1 + en)^'-V;^et. (5.17) 

In order to calculate the entropy, we further decompose J-"* ^- into 

Ki = U ^n,3,k = U {^^^'P<r{x-lJ^i) ■ H ^ ["&„, a. G } . (5.18) 

fc=l fc=l j=l 

Using the following general results on bracketing numbers taken from [TU] 



D{e,Ak,\\ ■ 111) < {tY (5-19) 



and [13], 

DTp. a,. II ■ Ih^l < ( 

D(.,^KdU-\U)<'^^^^0±^. (5,20) 

we obtain the following estimates of packing numbers 

/ 5 \ 

D(e„,A,„,||-||i)< - , (5.21) 



D{tn, [-&n,M'"^ II ■ 111) < {rnd)\{2tn) '"''(26„ + 2e„)'"', (5.22) 

d 

D{en, S.^j, II • 111) < rf!(2e„)-^ H^^^^^ + ^~")'' " ^"(^ + ^~")'"' + 2^")' (5-23) 

j=i 

^fe,-^;,-,fc,||-||i) (5.24) 

<D(e„,Afc,|| ■ ||i)Z}(e„,[-6„,6„]'-"M| ■ ||i)D(e„, || ■ ||i). 
Combining (5.21), (5.22), (5.23), and using the relationship between
covering and packing numbers, we have

N{3en, T n,j, II • 111) 

< D(3e„, J-" nj, II • 111) 

< D(e„,J-;^.,||-||i) 

< r„D(e„, J'*^^-^^,^, II ■ 111) 

d 

i=l 

< r„(r„rf)!(e„)i-'-"-'^"'^6;''^Q"(n-^(l + e^y-' + 2)" (5.25) 

for some constant Cg > 0. Combining (15.170 and (15.250 and applying Stirling's 
formula on (r„(i)!, we find that ^jN{en-i ^n,ji d)^JI\.n{^n,j) is bounded by a 
multiple of 

x(r„)^^"/2+3/4^-^)(i+d)(i-.„)/2^.„d/2^.„/2^^-A(^ + e„)^'-^ + 2)"'^ 

x(e„)(i+'^)(i-^'")/26;.'^/2Cg-"/2(^-A^^ + InY-^ V 2)'/' 

< exp{C9r„(logn)}(l + l^y-/\n~^{l + 1^)'-' V 2)"^ (5.26) 

for some constant Cg > 0. Observe that n~'^(l + InY'^^ < 2 implies (1 + 
tnY'/'^ ^ n^/^. Therefore, from equation (5.26), we have the following:

d Jn 

< ^^exp{Cior„(logn)}n^/22^/2 

+ 5^ 5^ Cg exp{Cnr„(log + e„)(i+'^)^-/2 

< exp{Ci2r„(logn)}(n^/V„^ + ^-^'^/^(l + Ir.f^^"^''"'^ J^) (5.27) 

for some new constants Cio, Cn, C12. Since J„ is defined that n~^(l + e„)'^" = 
exp{ne^(logn)'^}, the r.h.s of (15.271) is bounded by a multiple of 

exp{Ci3r„(logn) + ne^(logn)^(l + d)d/2}. (5.28) 

In order to let (I5.28P increase slower than exp{ne^}, we need 2^2 > max(tr + 
l,2ti + 5). 

Finally, we verify (15. 8p using similar arguments as in [TT]. For sufficiently 
large 6 > 0. 

i=lj=l j=l 

c{{F,a):P,{log^f<al^^^,k = l,2}, 

PF,cr ^ ' 

where ^ cr^'^| log obtained using Lemma [H Applying Lemma 10 

of [TT] with = and e = e^, the prior probability is lower bounded by a 
multiple of 

exp{-Ci4Ar„ log_ e„} > expj-Cun'^/^^^+'^Hlog^)'^^"'^'^^^"*'"'^'^}, (5-29) 
which decreases more slowly than e""""" if^ + d+1 — ^ < 2ti. Combining
with ^2 > ti, tr + 1 > 2ti and 2t2 > max(tf. + 1, 2ti + 6), we obtain t2 > 
f + + + l) where 6 is an arbitrary positive number and hence can 
be absorbed in the remaining terms. □ 

5.4 Proof of Theorem 2

The proof uses a multivariate modification to the proof in [13]. We consider 

g„ = n-^/^''+^''\\ogny\en = n-^/^''+^^\\ogny\an = il/^ for > ti > 0. 
Also, the number of finite mixture terms is fc„ = 0(n'^/'^'^~^^'^^(log ra)'^/'^^"'"*^"'/^). 

Then, in order to satisfy (5.8), we need
$t_1(2 + d/\beta) > d/\tau_2 + d + \max\{a_4 + 1, c_3, c_5/\tau_2\}$.
Conditions (5.7) and (5.9) together give $t_2 > \max\{t_1, t_1 - c_3/2 + 1/2\}$.
Combining these two constraints, we obtain
$t > \frac{\beta}{2\beta+d}\left(\frac{d}{\tau_2} + d + \max\{c_3, 1 + a_4, \frac{c_5}{\tau_2}\}\right) + \max\{0, (1 - c_3)/2\}$. □

6 Appendix 

The following three lemmas are helpful in controlling the KL divergence
between $f_0$ and $K_\sigma h_\beta$.

Lemma 5 Given $\beta > 0$, let $f_0$ satisfy Condition (C1). Then for all sufficiently
small $\sigma$ and all $x$ contained in the set

$$A_\sigma = \{x \in \mathbb{R}^d : |l_j(x)| \le B \sigma^{-j.} |\log\sigma|^{-j./2},\ j. = 1, 2, \ldots, r,\ |L(x)| \le B \sigma^{-\beta} |\log\sigma|^{-\beta/2} d^{-\beta/2}\},$$

we have

$$K_\sigma f_\beta(x) = f_0(x)\,(1 + O(\sigma^\beta) R(x)) + O(\sigma^H)\,(1 + R(x)), \qquad (6.1)$$
where $R(x) = s_{r+1} |L(x)| + \sum_{j.=1}^{r} s_j |l_j(x)|^{\beta/j.}$, $H$ is a positive number that can
be chosen arbitrarily large, and $s_{r+1}$ and $s_j$ are nonnegative constants.

Proof We follow the approach in Appendix (C) of [13]. By Condition (C1),

log/o(2;) < log/o(^) + - + - ^ll''- (6.2) 

\ogUy) > iog/o(a.) + ^-^(y - - L(^)\\y - ^f- (6-3) 

for all X and y with \\y — x\\ < 7. 

Define Bf^^r{x,y) = ^ ^ {y — xy + L{x)\\y — x\f. First we assume 
i=i 

P G (1,2] and r = 1. We want to demonstrate below that 

K.foix) < {l + 0{{\L{x)\ + Y\-^,^ogfo{x)\^y^)}Mx) 



i=l 

d 



+ (1 + \L{x)\ + Y l^log/o(^)lOoK). (6.4) 



i=l 

To prove fl6.4p . we define for any a; G M'^, 



D^ = {y: \y^ -x,\< k'a\ \oga\^l\ z = 1, . . . , 4, (6.5) 

where k' is a sufficiently large constant to be chosen below. 

Assume that k'a\ logcr|^/^ < 7 for cr as in Condition (CI). Then (16. 2p can be 

written as 

K^foix) < fo{x) [ e^f^-^^^y^cP,{y-x)dy+ f f,{y)<P,{y - x)dy. (6.6) 
Furthermore, if $x \in A_\sigma$ and $y \in D_x$, we consider the Taylor expansion of
$\exp\{B_{f_0,r}(x, y)\}$ to the $r$-th degree. Then for a sufficiently large $M$,



m=0 



< 



i:^{i:^-^(y--y+Lix)\\y-xr 

m=0 j.=l 



L(x 



+M\J2^iy - + L{^)\\y - ^\ 



J! 



r+1 



Since r = 1, (16.71) turns into 



9Xi 



+ M 



a 



^log/o(a:) 

dxi 



(6.7) 



(y,-x,) + L(a;)||?/-a;f (6.8) 



When integrating over D^., the terms with a factor ?/j — disappear. So 
the first term on the r.h.s of (16. 6p is bounded by 



/. d 

foix) / 0^(t/-a^)|l + MV 



+L{x)\\y - xf + Mk'^BL{x)\\y - xf^y, (6.9) 



where the following two inequalities are used for x ^ A^r and y G Dx'- 



d\ogfo{x) 



dxi 



< 5(7-^1 log (T|-^/2fcV| logcr|^/2 = k'B, 



\L{x)\\\y - xf < (rfFVll log a 1)^/2^(7^/3 1 ioga|-/^/2d- 



-/3/2 



(6.10) 



(6.11) 



l,...,d, 



Now J^,(f)^{y - x)\yi - Xil^dy = 0{a") for any a > 0, i 
when k' in the definition of is sufficiently large. By choosing constants 
ki = M{k'Bf-f^ and ^2 = 1 + Mk'^B, we obtain

< fo{x) J ^(t)a{y-x)^^l + kiY^^^^^^{yi-Xi)'^ 
+k2\L{x)\\\y - xf^y 

+ (1 + ll/olloo + l^^^f + ^21^(^)1)0(0, (6.12) 

i=l 

which gives (6.4) for $\beta \in (1, 2]$. Using similar arguments in the other direction
from (6.3), we obtain the lemma for $\beta \in (1, 2]$.

Now consider the case when $r > 1$, $\beta \in (r, r + 1]$. By calculations similar to
those in (6.2), (6.3), (6.6), (6.7) and (6.9), we have the following result:

Lr/2J 

K^foix) = /o (l + 5^ m,(a;)a^^ + i?(a;)0(a^)) + (1 + i?(a;))0(a^). (6.13) 

This follows by controlling the integral of terms containing a factor niLi(?/i ~ 
Xi)'^' over Dx. Since the normal kernel is symmetric over Dj., we only need 
to consider the case when fcj's are even numbers. When k. > r, there exists 
a fc* G N'^ satisfying k*. = r and k* < ki for i = l,...,d. Since one of 
the inequalities is strict, we can choose k* such that k* < ki and then define 
Q{x,y,f3) = minfc*(y — x)''* {yi — Xi)^^'^'^"^ . The integral is bounded by a 
multiple of cr^ when fc. > r by taking a factor Q{x,y,(3) out and bounding 
the remaining term by a certain power of |Zj|'s and \L\, which are denoted by 
mi{x) in (I6.13p . If k. < r, then they can be bounded by a multiple of cr^", 
where u < \r/2\. 
We can substitute $f_0$ in (6.13) by $f_\beta$ because / (f)a.{x)\\x\\''dx < oo for all
k G M'^, ki < oo, i = 1, . . . ,d, and the fact that J\\x\\>k'\ioga\'^/'^ (j)(T{x)\x\'^dx = 
0{a^) for all k and H taking arbitrary large values, provided that k' is suffi- 
ciently large. Hence the proof is complete. □ 

Lemma 6 Define Ea- = {x : g{x) > a^^}. Assume that /q satisfies Condi- 
tions (C1)—(C4)- Then for all sufficiently small a, all i E N'^ and e > 0.- 

/ Kfoix)dx = 0{a'^+^), [ KiU{x)dx = 0{a^^^^) (6.14) 

provided that Hi is sufficiently large. 

Proof Observe i = X]fc=i.7fc' Jk ^ N'^, where each component of jk only takes 
two values and 1. If some components of jk are 0, then we can remove these 
Os away and consider a corresponding convolution operator in a low-dimension 
case. Therefore it is good enough to prove (16.141) when ii = . . . = i^ = m 
for m G N. The proof for other cases can proceed in a similar way. In order 
to bound the first integral in (I6.14p . we consider sets Aa-^s = {x : |/j(£c)| < 
6 BcT-^\ log cT\-j/\ j. = l,...,r, |L(a;)| < 6Ba'-'^\\oga\''^/^d~'^/^} indexed by 
6 < 1. Using Markov's inequality and Condition (C3), 

< V{\L{x)\ > (55)(2/3+2^)//'a-2^-2^|loga|-(2/^+2e)/2| 

+ ^P{|/,-(a;)|(2/5+2e)/i. > ^^^^(2/3+2.)/i.^-(2/3+2.)|i^g^|-(/3+.)| 

= 0(a'^+^), (6.15) 
provided that a~'^ \ logcrj"'^"^ > 1 and e > 0, which is the case if a is sufficiently
small. This completes the proof for m = 0. 

If $m = 1$, consider independent random vectors $X$ and $U$ with density
$f_0$ and the standard normal density, respectively. Then $X + \Sigma U$ has density $K_\sigma f_0$. We
want to prove that $X \in A_{\sigma,\delta}$ together with $\|U\| \le k'|\log\sigma|^{1/2}$ contradicts
$X + \Sigma U \in A_\sigma^c$ when $\delta$ is sufficiently small.

We observe that X + EC/ e imphes 

\L{X + St/)| > Ba-^\ log(7|-^/2^-^/2 i^.^j^ ^ j.^^)! < ioga|-^-/2 

for some i satisfying i. <r. 

Prom Condition (CI), if 5 is sufficiently small, then for alH. = 1, . . . , r. 



i>i j.=rj>i 

< Ba-^\\ogcr\-^/'^. (6.16) 

Therefore it has to be a large value of $|L(X + \Sigma U)|$ that forces $X + \Sigma U$ to be
in $A_\sigma^c$. Hence it suffices to show that $|L(X)| \le \delta B \sigma^{-\beta} |\log\sigma|^{-\beta/2} d^{-\beta/2}$ and
$\|U\| \le k'|\log\sigma|^{1/2}$ contradict $|L(X + \Sigma U)| > B \sigma^{-\beta} |\log\sigma|^{-\beta/2} d^{-\beta/2}$.

From Condition (CI), wc assume L is a polynomial of degree q and has 
roots Zi, . . . Zq. Let rj = (maxj \zii\, . . . , maxj \zid\)- If \Xi\ < rji + 1 for i — 
l,...,d, then each component of ||X + EC7|| is bounded by corresponding 
component of 77 + 2 when a is sufficiently small. As a result, \L{X + S[7)| < 
Ecr"-^! log cr I Alternatively, if there exists a 1 < i* < 0? such that 
$|X_{i^*}| > \eta_{i^*} + 1$, then we consider the Taylor expansion of $L(X + \Sigma U)$:

j.=l •'■ ^■ 

g 

< SB{da^\\oga\y^^^ + Y,0{a^-^\\oga\^^-'^^/^), (6.17) 

i=i 

which is less than Ba~^\\oga\^^^'^d~^^'^ when a < 1 and 5 < 1 are small 
enough. 

Because P(||C/|| > k'\\og(r\^/^) = 0{a^'^+') for e > if k' is sufficiently 
large, we have 

P(X + SC/ G A^) 

< P(X + EC/ G A^, ||C/|| < A;'| loga|^/2) + P(||C/|| > A;'| logal^/') 
= P(X + SC/G A^,Xg ||C/|| < A;'| log a| 1/2) + 0(^2^+^) 

+P(X + EC/ G A^, X G ||C7|| < logal^/') 

< + 0(a2'3+^) + P(X G A^,^) 

< 0((t'''+^). (6.18) 

This completes the proof of the first equation in (6.14) for $m = 1$. For $m > 1$, we
can redefine the density of $X$ as $K_\sigma^{m-1} f_0$ and apply the same arguments as above
with a decreasing sequence of $\delta$'s.

Now we bound the second integral in (6.14). If $m = 0$, using Condition
(C2), we have

/ fo{x)dx= [ f,(x)-^{gix)Ydx<a^''^=0{a'^^^) (6.19) 
Je- Je- {g{x)) 

when Hi > {2(3 + e)/^. 
Consider m = 1, we define sets E^-^s = {x : fo{x) > a^^^} indexed by
6 < 1, random vectors X having density /o and U following standard normal 
distribution. Observe X + G fl Aa- contradicts with X G E^^s H A^r: 
on one hand, X + EU e E^^ and X e E^^s imply + ^U) - 1{X)\ > 
(1 — 6)Hiloga. On the other hand, X,X + HU G A^- implies that \1{X + 
SC/) - 1{X)\ < 5(T-'^| loga|-i/V^A;'| logaj^/^ = 0(1). 

Similarly to the previous treatment, for a sufficiently large constant $k'$ and $H_1 > (4\beta + 2\epsilon)/\delta$, we have



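$$
\begin{aligned}
P(X + \Sigma U \in E_\sigma^c)
&\le O(\sigma^{2\beta+\epsilon}) + P(X + \Sigma U \in E_\sigma^c \cap A_\sigma) \\
&\le O(\sigma^{2\beta+\epsilon}) + P(X + \Sigma U \in E_\sigma^c \cap A_\sigma,\ X \in E_{\sigma,\delta} \cap A_\sigma,\ \|U\| \le k'|\log\sigma|^{1/2}) \\
&\quad + P(X + \Sigma U \in E_\sigma^c \cap A_\sigma,\ X \in E_{\sigma,\delta}^c \cap A_\sigma,\ \|U\| \le k'|\log\sigma|^{1/2}) \\
&\le O(\sigma^{2\beta+\epsilon}) + P(X \in E_{\sigma,\delta}^c).
\end{aligned}
$$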



This completes the proof for $m = 1$. The above procedure can be repeated in the same way for $m > 1$ when $H_1$ is chosen sufficiently large. Hence we obtain (6.14). □

Lemma 7 Assume that $f_0$ satisfies Conditions (C1)–(C4). If $\beta > 2$, $x \in A_\sigma \cap E_\sigma$ and $\sigma$ is sufficiently small, then




(6.20) 



$$
K_\sigma h_\beta(x) = f_0(x)\left(1 + O(\sigma^\beta)R(x)\right) + O(\sigma^H)\left(1 + R(x)\right)
\tag{6.21}
$$

where $R(x)$ is defined in Lemma\^

Proof For x G A„ (1 E„, apply similar arguments on K^. as in Lemma to 
obtain 

K^.fix) = (1 + 0{a'^-)R^'^\x))fo{x) + 0((t^)(1 + R'^ix)) (6.22) 
Using Remark m for constants Uf. 

j>o 

Cf)-l 

= Cpfo - (l + OKOi?(""na^))/o(a:) + 0(0(1 + 

i=l 

> fo/2 (6.23) 

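when $\sigma$ is chosen to be sufficiently small. Therefore $A_\sigma \cap E_\sigma \subset J = \{x : f_\beta(x) > \tfrac{1}{2} f_0(x)\}$. Now since $x \in A_\sigma \cap E_\sigma$,

$$
K_\sigma h(x) = K_\sigma f_\beta(x) = f_0(x)\left(1 + O(\sigma^\beta)R(x)\right) + O(\sigma^H)\left(1 + R(x)\right).
$$

Therefore

$$
K_\sigma h_\beta(x) = K_\sigma h(x)/(1 + O(\sigma^{2\beta})) = f_0(x)\left(1 + O(\sigma^\beta)R(x)\right) + O(\sigma^H)\left(1 + R(x)\right),
$$

where $H$ can be chosen to be arbitrarily large. □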

Remark 5 The density function $h_\beta(x)$ is lower bounded by a multiple of $g(x)$ because

$$
\begin{aligned}
\int h(x)\,dx &= 1 + \int_{J^c} \left(\tfrac{1}{2} f_0 - f_\beta\right) dx \\
&= 1 + \int_{J^c} \Big(\tfrac{1}{2} f_0 - c_\beta f_0 + \sum_j c_j K_\sigma^j f_0\Big)\,dx \\
&= 1 + O(\sigma^{2\beta}).
\end{aligned}
\tag{6.24}
$$

Therefore $K_\sigma h_\beta$ is also lower bounded by a multiple of $g(x)$.

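Proof of Proposition [1] Using the inequality $\log x \le x - 1$ for all $x > 0$, we have

$$
\int_S p\log\frac{p}{q} \le \int_S p\,\frac{p - q}{q} = \int_S \frac{(p - q)^2}{q} + \int_S (p - q)
\tag{6.25}
$$

for any densities $p$ and $q$, and any set $S$. We apply this result for $p = f_0(x)$, $q = K_\sigma h_\beta(x)$ and $S = A_\sigma \cap E_\sigma$:

$$
\begin{aligned}
\int f_0(x)\log\frac{f_0(x)}{K_\sigma h_\beta(x)}\,dx
&\le \int_{A_\sigma^c \cup E_\sigma^c} f_0(x)\log\frac{f_0(x)}{K_\sigma h_\beta(x)}\,dx
+ \int_{A_\sigma^c \cup E_\sigma^c} \left(K_\sigma h_\beta(x) - f_0(x)\right) dx \\
&\quad + \int_{A_\sigma \cap E_\sigma} \frac{\left(f_0(x) - K_\sigma h_\beta(x)\right)^2}{K_\sigma h_\beta(x)}\,dx.
\end{aligned}
\tag{6.26}
$$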
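
The bound obtained above from $\log x \le x - 1$, namely $\int_S p\log(p/q) \le \int_S (p-q)^2/q + \int_S (p-q)$, holds pointwise and can be sanity-checked on discrete distributions. The Python sketch below does this for sums in place of integrals. The probability vectors `p` and `q` and the index set `S` are hypothetical illustration values.

```python
import math


def kl_on_set(p, q, S):
    """Sum over S of p_i * log(p_i / q_i): the left-hand side restricted to S."""
    return sum(p[i] * math.log(p[i] / q[i]) for i in S)


def upper_bound_on_set(p, q, S):
    """Sum over S of (p_i - q_i)^2 / q_i + (p_i - q_i): the right-hand side."""
    return sum((p[i] - q[i]) ** 2 / q[i] + (p[i] - q[i]) for i in S)


# Hypothetical discrete densities on four points (assumptions for illustration).
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
S = [1, 2, 3]

lhs = kl_on_set(p, q, S)
rhs = upper_bound_on_set(p, q, S)
print(f"lhs={lhs:.6f}  rhs={rhs:.6f}  holds={lhs <= rhs}")
```

The check works for any subset $S$ because $p\log(p/q) \le p(p/q - 1) = (p-q)^2/q + (p - q)$ holds term by term.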

Using Remark 5, $K_\sigma h_\beta(x)$ is lower bounded by a multiple of $g(x)$. Using Hölder's inequality and Remark [H the first term of (6.26) is bounded by



dx < 


/ 






+ 


/ 






+ 


j 







[ /o( 



a;) I log , \dx 



< C,{ / /„(log,/„)% /c,/„(l)d.}*"{ l^^ M.) 



!/<? 
for constants $C_1, C_2 > 0$, $q = (2\beta + \epsilon)/2\beta$, $p = (2\beta + \epsilon)/\epsilon$ and ^ as defined in Condition (C2). By choosing $H_1$ such that equation (6.14) in Lemma 6 holds for $i = 0$, the first integral of the r.h.s. of (6.26) is $O(\sigma^{2\beta})$.
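
The exponent bookkeeping in this Hölder step can be checked with exact rational arithmetic. The sketch below verifies that $q = (2\beta+\epsilon)/2\beta$ and $p = (2\beta+\epsilon)/\epsilon$ are conjugate exponents, and that raising a quantity of order $\sigma^{2\beta+\epsilon}$ to the power $1/q$ gives order $\sigma^{2\beta}$. The specific values of `beta` and `eps` are assumptions chosen only for illustration.

```python
from fractions import Fraction


def holder_exponents(beta: Fraction, eps: Fraction):
    """Return (p, q) with q = (2*beta + eps) / (2*beta), p = (2*beta + eps) / eps."""
    q = (2 * beta + eps) / (2 * beta)
    p = (2 * beta + eps) / eps
    return p, q


# Hypothetical smoothness and slack values (assumptions for illustration).
beta, eps = Fraction(5, 2), Fraction(1, 3)
p, q = holder_exponents(beta, eps)

conjugate = (1 / p + 1 / q == 1)   # Hoelder conjugacy 1/p + 1/q = 1
rate = (2 * beta + eps) / q        # exponent of sigma after taking the 1/q power

print(f"p={p}, q={q}, conjugate={conjugate}, resulting exponent={rate}")
```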

Since $h_\beta$ is a linear combination of $K_\sigma^j f_0$'s, so is $K_\sigma h_\beta(x)$. Therefore by another application of (6.14), we obtain that the second integral is a finite sum of $O(\sigma^{2\beta+\epsilon})$ terms, which is still $O(\sigma^{2\beta+\epsilon})$.

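For the last integral of the r.h.s. of (6.26), we apply Lemma 7. Observe that when $x \in A_\sigma \cap E_\sigma$, $K_\sigma h_\beta(x)$ is bounded by a multiple of $f_0$ given $H > H_1$. Hence this integral is bounded by a multiple of

$$
\Big(\int_{A_\sigma \cap E_\sigma} f_0(x)R^2(x)\,dx\Big)O(\sigma^{2\beta})
+ \Big(\int_{A_\sigma \cap E_\sigma} \frac{(1 + R(x))^2}{f_0(x)}\,dx\Big)O(\sigma^{2H})
+ 2\Big(\int_{A_\sigma \cap E_\sigma} R(x)(1 + R(x))\,dx\Big)O(\sigma^{\beta+H}).
\tag{6.27}
$$

Condition (C1) implies $\int f_0(x)R^k(x)\,dx = O(1)$ for $k = 1, 2$. By choosing $H$ satisfying $H > H_1 + \beta$ and using $f_0(x) > \sigma^{H_1}$ on $E_\sigma$, these three integrals in (6.27) are $O(\sigma^{2\beta})$, hence (4.1) follows.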

The integral in (4.2) can be treated in a similar way:

//„(log^)= < / /„M(log-^)^dx 

J K„hp' Ja-„ue- K„hp{x)' 

^ r iMx) - K^h,{x)y 
U^nE^ Kahp{x) 
JA„nE„ (K„hfi(x))



'A^nE^ [K„hji{x)y 
where the first two terms on the r.h.s. are shown to be $O(\sigma^{2\beta})$, and the last integral can be bounded by a multiple of

/ fo{x)R'{x)0{a^^)dx + 3 [ R\x)0 {a^^+'') 

+3 / R%x)0{a('+''')/foix)+ [ R\x)0{a^'')/fl{x)dx 
= 0{a^^) (6.29) 

by choosing $H > H_1 + \beta$. □

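Proof of Lemma [1] Define the set $E'_\sigma = \{x : h_\beta(x) > \sigma^{H_2}\}$ with $H_2 > H_1$ and $\tilde h_\beta(x) = h_\beta(x) 1_{E'_\sigma}(x) / \int_{E'_\sigma} h_\beta(x)\,dx$. Remark [S] implies $E'_\sigma \supset E_\sigma$. Using Lemma Eland Remark m we have $\int_{E'_\sigma} h_\beta(x)\,dx = 1 - \int_{E_\sigma'^c} h_\beta(x)\,dx = 1 + O(\sigma^{2\beta})$.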



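$$
\int f_0\log\frac{f_0}{p_{F,\sigma}}
= \int f_0\log\frac{f_0}{K_\sigma h_\beta}
+ \int_{E_\sigma} f_0\Big(\log\frac{K_\sigma h_\beta}{K_\sigma \tilde h_\beta} + \log\frac{K_\sigma \tilde h_\beta}{p_{F,\sigma}}\Big)
+ \int_{E_\sigma^c} f_0\log\frac{K_\sigma h_\beta}{p_{F,\sigma}}
\tag{6.30}
$$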
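From Theorem [H the first term is $O(\sigma^{2\beta})$. Now observe

$$
\frac{K_\sigma h_\beta(x)}{K_\sigma \tilde h_\beta(x)}
= \left(1 + O(\sigma^{2\beta})\right)\left(1 + \frac{\int_{E_\sigma'^c} \phi_\sigma(x - y)h_\beta(y)\,dy}{\int_{E'_\sigma} \phi_\sigma(x - y)h_\beta(y)\,dy}\right).
\tag{6.31}
$$

For $x \in E_\sigma$ and $y \in E_\sigma'^c$, because $\int_{E_\sigma'^c} \phi_\sigma(x - y)h_\beta(y)\,dy \le \sigma^{H_2} \le \sigma^{H_2 - H_1}g(x)$ and $\int_{E'_\sigma} \phi_\sigma(x - y)h_\beta(y)\,dy \ge C_0 g(x)$ for a constant $C_0 > 0$, (6.31) is upper bounded by $(1 + O(\sigma^{2\beta}))(1 + C_0^{-1}\sigma^{H_2 - H_1}) = 1 + O(\sigma^{2\beta})$ when we choose $H_2 > H_1 + 2\beta$. On the other hand, (6.31) is lower bounded by $1 + O(\sigma^{2\beta})$. Hence $K_\sigma h_\beta / K_\sigma \tilde h_\beta = 1 + O(\sigma^{2\beta})$ and therefore $\int_{E_\sigma} f_0\log(K_\sigma h_\beta / K_\sigma \tilde h_\beta) = O(\sigma^{2\beta})$.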
Now we bound /q log . Apply Lemma H] for e = 6"'"^''°^°'' for some
constant Ci and 71 = 1/t2, let pi^^o- be the finitely supported mixture approx- 
imating such that \\Ka-hi3 — Pf,ct\\ 00 < cr^'^e^'"^''"^'^' and F has at most 
Ka~'^\ logo"!'^/^^"'"'^ many support points, which are all contained in E'^ because 
of Condition (C3). Notice that these support points are also contained in $\{x : f_0(x) > c\sigma^{H_2}\}$ for sufficiently small $c > 0$ by Remark 3. Applying

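$$
\log\frac{p(x)}{q(x)} \le \frac{|p(x) - q(x)|}{\min\{p(x), q(x)\}} \le \frac{\|p - q\|_\infty}{\min_y p(y) - \|p - q\|_\infty}
\tag{6.32}
$$

if $\min_y p(y) - \|p - q\|_\infty > 0$, for $p = K_\sigma \tilde h_\beta$ and $q = p_{F,\sigma}$, we have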

Je^ PF,a Je„ coa^'^ - \\K„h^ -pF,Aoc 

When $\sigma$ is small enough and $C_1$ is large enough, the above estimate is $O(\sigma^{2\beta})$.
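
The elementary bound (6.32) on the log-ratio of two positive functions can be checked numerically. The sketch below treats `p` and `q` as function values on a common finite grid and verifies both inequalities in the chain. The grid values are hypothetical, and the minimum over $y$ is taken over the same grid, which is an assumption of this illustration.

```python
import math


def sup_norm(p, q):
    """Maximum absolute difference between p and q on the grid."""
    return max(abs(a - b) for a, b in zip(p, q))


def check_log_ratio_bound(p, q, tol=1e-15):
    """Check log(p/q) <= |p - q| / min(p, q) <= ||p - q|| / (min p - ||p - q||)."""
    gap = sup_norm(p, q)
    floor = min(p) - gap
    if floor <= 0:
        raise ValueError("bound requires min p - ||p - q||_inf > 0")
    second = gap / floor
    for a, b in zip(p, q):
        first = abs(a - b) / min(a, b)
        if not (math.log(a / b) <= first + tol and first <= second + tol):
            return False
    return True


# Hypothetical positive function values on a three-point grid.
p = [0.5, 0.8, 1.2]
q = [0.45, 0.9, 1.1]
print(check_log_ratio_bound(p, q))
```

The first inequality follows from $\log(1 + t) \le t$, and the second from $\min\{p(x), q(x)\} \ge \min_y p(y) - \|p - q\|_\infty$.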

Finally, we bound the last term in (6.30). Using Lemma 3, we can add a mixture component with mean 0 and weight $\sigma^{2\beta}$ without influencing the approximation results. Combining this with the fact that $K_\sigma h_\beta$ is upper bounded by a constant $C_2$, we have

/ Ux)\og^^^^dx < a"^^ ! fo{^)^,^og-^dx 
Je% Pf,o- Je-^ 9^[x) a^P<pcj[x) 

< ^^^^ / foix)^Jxra~''dx 

Je- g'^ix) 

= 0(^2^) (6.34) 

when $H_1$ is chosen to be large enough. Hence the proof of the first equation in (4.6) is complete. The proof of the second equation proceeds in the same way as in Appendix E of [13].
Proof of Lemma [2] Choose $\sigma_0 = 2a/\Phi^{-1}(5/6)$ such that $N(0, \sigma_0)$ gives probability 1/3 to $(0, 2a)$. Let $\sigma = (\sigma_0, \dots, \sigma_0)'$. Then if $x \in D$,



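$$
\begin{aligned}
K_\sigma f_0(x) &\ge \int_D f_0(\theta)\phi_\sigma(x - \theta)\,d\theta \\
&\ge c_0\int_D \phi_\sigma(x - \theta)\,d\theta \\
&\ge c_0\prod_{i=1}^d\left[\Phi\!\left(\frac{x_i + a}{\sigma_0}\right) - \Phi\!\left(\frac{x_i - a}{\sigma_0}\right)\right] \\
&\ge \frac{c_0}{3^d}.
\end{aligned}
\tag{6.35}
$$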

If $x \notin D$, then at least one of the $x_i$'s is not in $[-a, a]$. We only consider the case $x_1 > a$, $x_2 < -a$ and $x_3, \dots, x_d \in [-a, a]$. The calculation can be done for the other cases in a similar way.

fXi pa f „ d 



KM^) > c r r [ ■■■ [l[gMMx-0)dO 



Co 



> c/ giixi)(j)aoixi - 6i)d6i I g2ix2)(f)aoix2 - 02)d62 



X2 

2a. 1\ n _-2a. 



> ^*'(-'*(-'(*0-i)(i-* 

> Cogiixi)g2ix2) 

> Cig{x) (6.36) 
for some positive constants $C_0$ and $C_1$. □
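
The choice $\sigma_0 = 2a/\Phi^{-1}(5/6)$ at the start of the proof of Lemma 2 can be checked numerically. The sketch below assumes $\sigma_0$ denotes the standard deviation of the normal law and that each coordinate of $D$ ranges over $[-a, a]$. It confirms that the mass of $(0, 2a)$ is $1/3$, and that the one-coordinate mass $\int_{-a}^{a}\phi_{\sigma_0}(x - \theta)\,d\theta$ stays at or above $1/3$ for $x \in [-a, a]$. The helper names are hypothetical.

```python
from statistics import NormalDist


def sigma0_for(a: float) -> float:
    """sigma_0 = 2a / Phi^{-1}(5/6), read as a standard deviation (assumption)."""
    return 2.0 * a / NormalDist().inv_cdf(5.0 / 6.0)


def prob_zero_to_2a(a: float) -> float:
    """Mass that N(0, sigma_0) assigns to the interval (0, 2a)."""
    dist = NormalDist(mu=0.0, sigma=sigma0_for(a))
    return dist.cdf(2.0 * a) - dist.cdf(0.0)


def coordinate_mass(x: float, a: float) -> float:
    """Integral over theta in [-a, a] of the N(0, sigma_0) density at x - theta."""
    dist = NormalDist(mu=0.0, sigma=sigma0_for(a))
    return dist.cdf(x + a) - dist.cdf(x - a)


a = 1.0  # hypothetical half-width of D in each coordinate
print(prob_zero_to_2a(a))
print(min(coordinate_mass(a * j / 10, a) for j in range(-10, 11)))
```

The minimum of the one-coordinate mass is attained at the endpoints $x = \pm a$, which is where the value $1/3$ comes from.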

Proof of Lemma [3] By an easy multidimensional extension of Lemma 5 in [11], we have

N 

\\pF,.-PF'Ai < . ' , w + E \mi^j,^))-Pj\ (6.37) 
Similarly, by a multidimensional extension of Lemma 3 in [13]



d 



\\pF',a- PF', CT'Wl < \\<Pa " <Pa'\\l < V " < maX^^— ^ (6.39) 



i=l 



IIP... - P....IU ^ I n ^ - n ^1 s (.,.,A;„)^^ in^- - n-:i 

j = l * 4 = 1 * V (1)7 j^]^ j^]^ 

Using the triangle inequality on (6.37) and (6.39) gives (5.3). Similarly, combining
flOH]) and flCT]) gives □ 

Proof of Lemma [4] The proof of part 1 proceeds in a similar way to
Lemma 3.1 in [10]. Subscript in ao is used to denote that ctq and cTq are fixed 
here. For simplicity, we drop them in the proof. 
We first observe the following: 



< ^^=^e (6.41) 



So \\pF,a- - PFyWoo < ^e- Define M = max(2ai, . . . , 2ad, cr^8 log_ e) 



sup - PF'A^) I < 2 n 0.(M - a,) < !l° < £-<^e. (6.42) 

k>|>M (V27ro:) 



Applying Taylor's expansion to $\phi_\sigma(x)$ and using the fact that $k! \ge k^k e^{-k}$, we get

A=l 2af) / 1 



0=0 



(6.43) 
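
The factorial bound $k! \ge k^k e^{-k}$ used with the Taylor expansion above is a standard consequence of Stirling-type estimates. A quick numerical check on the log scale, which avoids overflow for large $k$, might look as follows. The function name is hypothetical.

```python
import math


def log_factorial_gap(k: int) -> float:
    """Return log(k!) - log(k^k * e^{-k}); nonnegative when k! >= k^k e^{-k}."""
    return math.lgamma(k + 1) - (k * math.log(k) - k)


for k in (1, 2, 5, 10, 100, 1000):
    print(f"k={k:5d}  log(k!) - log(k^k e^-k) = {log_factorial_gap(k):.4f}")
```

The gap grows like $\tfrac{1}{2}\log(2\pi k)$, so the bound is never tight but is sufficient for the remainder estimate.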
Therefore, for any $F \in \mathfrak{M}(D)$,



sup \PF,a{^) - PF'A^ 

\xi\<M 

fe-1 d 



< sup I / E(2")''^'n""M^[E ^% 2'^^ ydiF-F')iz) 



+2 sup 

\zi\<ai 



j=0 i: 



(6.44) 



We want to choose $F'$ such that $\int z_1^{\alpha_1} \cdots z_d^{\alpha_d}\,dF = \int z_1^{\alpha_1} \cdots z_d^{\alpha_d}\,dF'$ for all $\alpha_1, \dots, \alpha_d$ satisfying $1 \le \sum_{i=1}^d \alpha_i \le 2k - 2$. Thus $F'$ can be chosen on $D$ with at most support points, where



2fc-2 



2fc-2 



+ ,.1 (2fc) 



(6.45) 

1=1 i=i ^ 
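
To get a feel for how many moment constraints the matching condition above imposes, one can count the monomials $z_1^{\alpha_1}\cdots z_d^{\alpha_d}$ with total degree between 1 and $2k - 2$. The sketch below compares a brute-force enumeration with the closed form $\binom{2k-2+d}{d} - 1$. It only counts constraints: the link to the number of support points (a Carathéodory-type argument typically gives a discrete measure with at most one more support point than constraints) is stated here as background, not as a restatement of (6.45).

```python
from itertools import product
from math import comb


def count_monomials_bruteforce(d: int, max_degree: int) -> int:
    """Count exponent vectors in {0,...,max_degree}^d with 1 <= total degree <= max_degree."""
    return sum(
        1
        for exponents in product(range(max_degree + 1), repeat=d)
        if 1 <= sum(exponents) <= max_degree
    )


def count_monomials_closed_form(d: int, max_degree: int) -> int:
    """Monomials of total degree at most max_degree in d variables, minus the constant."""
    return comb(max_degree + d, d) - 1


for d in (1, 2, 3):
    for k in (1, 2, 3, 4):
        n = count_monomials_closed_form(d, 2 * k - 2)
        print(f"d={d}, k={k}: {n} moment constraints")
```

For fixed $d$ the count grows like $k^d$, which is consistent with the number of support points scaling as a power of $\log_-\epsilon$ once $k$ is chosen of that order.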

As a result, the first term in (6.44) is canceled out. Now we want to bound

the second term. Observe that \xi — Zi\ < M + ai < max(3L, vT8cT)(log_ e)'^°. 

Denote c = e^/^2^^/2(j-i ]2iax(3L, a/ISo"), then we bound the second term in 

f lCTj) by 

2(^=) ^(log_e)2^^ = 2(^=^) exp{-A;(logA;-21og(c(log_e)^°))} 

(6.46) 

By choosing k = [(1 + c^)(log_ e)^'^^\ + 1, the above term is bounded by a^^e. 

Without loss of generality, consider W = {\xi\ > M,\xi\ < Mi,i = 
2, . . . ,d}. Then 



sup \pfA^) -Pf'A^) 
w 



< 2<PAM-ai) sup 

i=2,...,d 



bAx,-z,)d{F-F') 



2'Ka 



(6.47) 
Finally, combining the results in (6.44), (6.45), (6.46) and (6.47) completes the proof.

Now we prove the second part in a similar way as Lemma 2 of [llj. We par- 
tition intervals [— a^, into \2a^/a\ disjoint, consecutive subintervals of length 
cr and an interval of length less than a. Then D is divided into k = [2a^/a\'^ 
disjoint, consecutive and equally spaced regions Ji, . . . , Jfe and some final pieces 
that has area less than a'^. Applying the result from first part on [0, l]*^, we 
have approximation result as \\pf,(t ~ Pf', a Woo ^ cr^'^e while the number of 
support points of F' is bounded by a multiple of Hil'^e^^^ ^ l)(log_e)'^ < 
cr~'^(log_ Applying a multivariate version of Lemma 3.2 of [10], we get 

\\PF,a - PF',cr'\\l < \\PF,a -pF'.o-'lloomaX {a-^log. \\pF,a- - PF' , a' Woo'^'^ , Cla, l} 

< a''{a{\og_e)'/'w{\og_erYe. (6.48) 
Hence the proof is complete. □ 